A robust Bayesian formulation of the optimal phase measurement problem 
Optical phase measurement is a simple example of a quantum-limited measurement problem
with important applications in metrology such as gravitational wave detection. The formulation of 
optimal strategies for such measurements is an important test-bed for the development of robust 
statistical methods for instrument evaluation. However, the class of possible distributions exhibits 
extreme pathologies not commonly encountered in conventional statistical analysis. To overcome 
these difficulties we reformulate the basic variational problem of optimal phase measurement within 
a Bayesian paradigm and employ the Shannon information as a robust figure of merit. Single-mode 
performance bounds are discussed, and we invoke a general theorem that reduces the problem of 
finding the multi-mode performance bounds to the bounding of a single integral, without need of 
the central limit theorem. 
Quantum limits to optical phase-shift measurement
are important in diverse areas, from the design of gravity
wave detectors to telecommunication and optical fibre
sensing. For the measured datum $\phi \in [-\pi, +\pi]$, and an
unknown "true phase" $\Phi$, the problem is to achieve an
optimal statistical design for the phase detection curve
$p(\phi|\Phi)$ which relates them (under a cost constraint such
as fixed average photon number, $N$).

The standard interferometric performance limits are 
well-known[2]: the shot-noise limit, $\Delta\phi < 1/\sqrt{N}$, for
coherent state inputs; and $\Delta\phi < 1/N$, for optimized
squeezed state inputs[3]. However, under the stimulus
provided by Shapiro, Shepard and Wong's [4] suggestion
of a possible $O(1/N^2)$ scheme, it has become important
to find a robust measure of optimality that copes with 
statistical pathologies [5-9]. 

To paraphrase the overall problem, we may classify 
three basic tasks: 1) determine how to describe phase 
measurements; 2) determine how to prepare particular
states, and implement the desired measurements; and 3)
determine, in company with the above, the best scheme 
under some chosen optimality criterion. 

This letter concerns the last item, and so we pick a gen- 
eral theoretical setting due to Shapiro and Shepard[10] 
that best illustrates the difficulties. 

Measurement is here described using the theory of
probability operator measures[11]. The classical data, say
$\phi$, and the measured quantum state, say $\hat{\rho}(\Phi)$, are then
related by the conditional probability rule

$$ p(\phi|\hat{\rho}(\Phi))\,d\phi = \mathrm{tr}[\hat{\rho}(\Phi)\,\hat{\Pi}(\phi)]\,d\phi, \tag{1} $$

where $\hat{\Pi}(\phi)$ is a family of positive hermitian operators
which respects the closure constraint $\sum_{\phi} \hat{\Pi}(\phi) = \hat{1}$, so
that $\sum_{\phi} p(\phi|\hat{\rho}) = 1$. The $\hat{\Pi}(\phi)$ need not be projectors
(nor orthonormal, if they were).

For optical phase, Shapiro and Shepard[10] imagine
a scheme where an ingoing probe state is phase-shifted
by $\Phi$ and then subjected to an idealized Susskind-Glogower
measurement[12]. For the rule (1) they obtain
(2)



where the $\psi_n$ are number-ket coefficients of the probe
beam (to be optimized). Although there is no known way
to implement the SG-measurements, they are thought to 
be optimal[13] (and the derived statistics agree with the 
Pegg-Barnett hermitian phase operator approach[14]). 

Interestingly, given some $p(\phi|\Phi)$, (2) can be "inverted"
to find a corresponding minimum average energy
state[10]. Any kind of statistical behaviour is possible
and one must select a criterion that excludes pathologies.
For instance, the SSW-state[4] ($\psi_n = \sqrt{6}/[\pi(1+n)]$,
for $n < M(N)$, or zero, with $M(N)$ chosen so $N$ is the
mean photon number) is strictly optimal by reciprocal 
peak likelihood, but has been shown to be sub-optimal 
using other criteria for both single-mode [5-7] and multi- 
mode [8] detection strategies. 

One of the characteristic problems encountered in such 
studies is to adequately evaluate the utility of a sharp 
central peak sitting upon a broad tail. It is this kind 
of pathology that the SSW-state possesses. Measures 
such as peak likelihood and rms phase error bias one
or the other of these elements to a greater or lesser degree.
As Hall has argued [6], there are at least two good can- 
didates, the use of confidence intervals, or the Shannon 
information (Fisher information [8] is another possibility
for the analysis of multi-mode schemes). In a recent paper,
Bialynicki-Birula et al.[9] reported a numerical optimization
of the single-mode problem for five different
criteria. They found that the Shannon information occupied
the "middle ground" among these. It seems not
to place undue emphasis upon either "peaks" or "tails",
which is desirable to fix a robust variational problem that
rejects false solutions. Indeed Shannon information can
exclude even the most extreme pathologies, such as a
singular peak atop a broad tail [7].

With the goal of robustness in mind we reformulate the 
general multi-mode optimization problem in information-theoretic
terms. A secondary purpose will be to show
how the Bayesian methodology fits easily with entropic 
measures of uncertainty, and quantum mechanics [15]. 

The important feature of (1) is that we can fix upon
a $\hat{\rho}$, and imagine a class of all possible $\hat{\Pi}(\phi)$, or vice
versa. Some choice returns a $p(\phi|\hat{\rho})$. This rule is read
as p(data|state), and we see that a "good" instrument 
must closely correlate particular data (i.e. the observed 
readings) with some particular state (i.e. that we wish 
to know). Ideally, we seek a delta function correlation, 
but in general it will be more fuzzy due to the effects of 
quantum and classical noise. 

In the Bayesian viewpoint[16] one looks upon this link
as being reflexive, i.e. we seek to find p(state|data). To
do this one must introduce a prior probability, $p_0(\text{state})$,
for the as yet unknown states. Then we use Bayes' rule
of conditionals:

$$ p(\text{data}, \text{state}) = p(\text{data}|\text{state})\,p_0(\text{state}), \tag{3} $$

to perform the "statistical inversion"

$$ p(\text{state}|\text{data}) = \frac{p(\text{data}|\text{state})\,p_0(\text{state})}{\sum_{\text{state}} p(\text{data}|\text{state})\,p_0(\text{state})}. \tag{4} $$
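As an illustration of the inversion (3)-(4), here is a minimal sketch for a discrete instrument. The two-state, three-outcome likelihood table and the function name `posterior` are hypothetical, chosen only to make the arithmetic visible; they are not taken from the phase problem itself.

```python
import numpy as np

def posterior(likelihood, prior):
    """Bayes inversion, Eq. (4).

    likelihood[d, s] = p(data=d | state=s); prior[s] = p0(state=s).
    Returns post[d, s] = p(state=s | data=d).
    """
    joint = likelihood * prior                    # Eq. (3): p(data, state)
    evidence = joint.sum(axis=1, keepdims=True)   # p(data), summed over states
    return joint / evidence

# Hypothetical instrument: columns are states, rows are data outcomes.
likelihood = np.array([[0.7, 0.1],
                       [0.2, 0.3],
                       [0.1, 0.6]])
prior = np.array([0.5, 0.5])                      # uniform prior over two states

post = posterior(likelihood, prior)
print(post)   # each row is a normalised distribution over states
```

With a uniform prior the posterior is just the likelihood renormalised over states, which is the situation used for optical phase below.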



In general, two problems arise. We may not know what 
p(data|state) is, or we may not have a good way to single 
out a prior distribution $p_0(\text{state})$ [17].

In quantum theory, the situation is better than one 
might first expect [15]. Now, unlike in classical statistics, 
we can engineer a particular p(data|state). It is subject 
to control, and design (as evidenced by the optimal phase 
measurement problem). Secondly, the space of states is a
physical space upon which physical symmetry principles
can be brought to bear to fix the Laplacian notion of a
priori complete ignorance. For optical phase the answer
is obvious. We choose $p_0(\Phi) = 1/2\pi$, that unique function
invariant under phase changes $\Phi \mapsto \Phi + \delta\Phi$ (an example
of a general principle advocated by Jaynes[18]).

Now it remains to quantify optimality. In general, we 
must place a figure of merit upon p(state, data), the joint 
correlation between states and data. Significantly, it is 
not merely p(data|state) that matters. For instance, one 
can imagine an instrument that was very accurate for 
some states, and poor for others. Optimal measurement 
is thus a notion defined relative to those situations we 
expect to encounter in practice. 

In the optimal design problem we must look, therefore, 
for a figure of merit defined upon p(state, data), with 
$p_0(\text{state})$ chosen to reflect our design intentions.

The standard measure of covariance, based upon an
analysis of variance, is the obvious choice. However, to
ensure a robust solution we will employ the mutual information[19]

$$ \langle \Delta I \rangle = \sum_{\text{state}} \sum_{\text{data}} p(\text{data}, \text{state}) \log_2 \frac{p(\text{data}, \text{state})}{p(\text{data})\,p(\text{state})} \tag{5} $$

of communication theory. This quantity is non-negative,
and zero if, and only if, the distributions are statistically
independent (an uninformative measurement) [15].
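The two limiting cases of (5) can be checked with a short sketch. The function name `mutual_information_bits` and the two small joint tables are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def mutual_information_bits(joint):
    """Eq. (5) for a discrete joint table joint[d, s] = p(data=d, state=s)."""
    joint = np.asarray(joint, dtype=float)
    p_data = joint.sum(axis=1, keepdims=True)    # marginal over states
    p_state = joint.sum(axis=0, keepdims=True)   # marginal over data
    product = p_data @ p_state                   # p(data) * p(state)
    mask = joint > 0                             # convention: 0 * log 0 = 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

# Statistically independent data and state: an uninformative measurement.
independent = np.outer([0.3, 0.7], [0.5, 0.5])
# Perfectly correlated data and state over two equally likely states.
correlated = np.diag([0.5, 0.5])

print(mutual_information_bits(independent))
print(mutual_information_bits(correlated))
```

The independent table returns zero gain, while the perfectly correlated two-state table returns one bit, the most that two equally likely states can carry.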

Further, one has an obvious communication theoretic 
analogy. The above measure is the average number of 
bits that could be sent if we encoded messages in a set 
of physical states that are sent with probability $p_0(\text{state})$
(in practice a relative frequency). Here it measures the
information gained from data about the state, for an instrument
whose performance is assessed on an imaginary
ensemble of states distributed according to $p_0(\text{state})$.

Now we apply (5) to the optical phase measurement
problem. Going back to (2) one may think of $\psi(\Phi)$ as
the "information carrier", a phase modulated signal, and
set $p_0(\Phi) = 1/2\pi$, so that all phase-shifts are equally
likely a priori. Then the $\sum_{\text{state}}$ becomes integration
with $\Phi \in [-\pi, \pi]$, and similarly for the $\sum_{\text{data}}$.

From (2) we obtain the multi-mode correlation

$$ p(\phi_1,\ldots,\phi_m|\Phi) = \prod_{j=1}^{m} p(\phi_j|\Phi). \tag{6} $$

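Choosing the uniform prior $p_0(\Phi) = 1/2\pi$, we apply
Bayes' rule (3) to obtain

$$ p(\phi_1,\ldots,\phi_m) = \int_{-\pi}^{\pi} d\Phi\, p(\phi_1,\ldots,\phi_m|\Phi)\, p_0(\Phi). $$

Then, using (4), we have

$$ p(\Phi|\phi_1,\ldots,\phi_m) = \frac{p(\phi_1,\ldots,\phi_m|\Phi)}{2\pi\, p(\phi_1,\ldots,\phi_m)}. \tag{7} $$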

Using (3) once more, we substitute this into (5), and
rearrange to obtain

$$ \langle \Delta I \rangle = \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} d\phi_1 \ldots d\phi_m\, p(\phi_1,\ldots,\phi_m) \int_{-\pi}^{\pi} d\Phi\, p(\Phi|\phi_1,\ldots,\phi_m) \log_2\bigl(2\pi\, p(\Phi|\phi_1,\ldots,\phi_m)\bigr) \tag{8} $$

as the gain in bits for a multi-mode measurement on
$m$ identical pulses yielding the data $\phi_1,\ldots,\phi_m$. In
this problem one must optimize, conjointly, the chosen
$\psi_n$, and the number of pulses $m$, subject to the total
average photon number constraint $N = m\bar{n}$, where
$\bar{n} = \langle a^\dagger a \rangle_{\text{single mode}}$ is the average photon number per
mode (see Lane et al.[8]).
In the special case of a single mode we set $\theta = \phi - \Phi$,
and introduce the new function $f(\theta) = 2\pi p(\phi|\Phi)$, where
$f(\theta) = f(-\theta)$. Then (8) assumes the simple form:

$$ \langle \Delta I(N) \rangle = \int_{-\pi}^{\pi} f_N(\theta) \log_2 f_N(\theta)\, d\theta/2\pi, \tag{9} $$

where $N$ is the mean photon-number of the single mode,
and our interest lies in the regime $N \gg 1$, for parametric
families of probe states $\psi_n(N)$.
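The reduction of the multi-mode gain to its single-mode form can be spelled out in three steps. This is a sketch that assumes, as the substitution $\theta = \phi - \Phi$ suggests, that $p(\phi|\Phi)$ depends only on the difference $\theta$ and is $2\pi$-periodic, so that $\int_{-\pi}^{\pi} f(\theta)\, d\theta/2\pi = 1$.

```latex
\begin{align*}
p(\phi) &= \int_{-\pi}^{\pi} d\Phi\, p(\phi|\Phi)\, p_0(\Phi)
         = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{d\theta}{2\pi}\, f(\theta)
         = \frac{1}{2\pi}, \\
p(\Phi|\phi) &= \frac{p(\phi|\Phi)}{2\pi\, p(\phi)} = \frac{f(\theta)}{2\pi}, \\
\langle \Delta I \rangle
  &= \int_{-\pi}^{\pi} \frac{d\phi}{2\pi} \int_{-\pi}^{\pi} d\Phi\,
     \frac{f(\theta)}{2\pi}\, \log_2 f(\theta)
   = \int_{-\pi}^{\pi} f(\theta) \log_2 f(\theta)\, \frac{d\theta}{2\pi}.
\end{align*}
```

The data marginal is uniform, so the outer integral over $\phi$ contributes a factor of one and only the shape of $f$ matters.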

Elsewhere[7], we used (9) to reconsider the three trial
states of Shapiro, Shepard and Wong's paper [4]. In their
naming scheme, we get:

$$ \langle \Delta I_{\rm SSW}(N) \rangle < 0.966, \tag{10} $$
$$ \langle \Delta I_{\rm CS}(N) \rangle \sim \tfrac{1}{2} \log_2 N + 1.604, \tag{11} $$
$$ \langle \Delta I_{\rm TS}(N) \rangle \sim \log_2 N - 0.220. \tag{12} $$

Whereas the coherent state (CS) and truncated phase
state (TS) return unbounded information gain, we may
expend infinite photon energy and recover no more than
one bit from the SSW-state.
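The three trial states can be compared numerically with a short sketch. It assumes the single-mode detection function has the standard Susskind-Glogower form $f(\theta) = |\sum_n \psi_n e^{in\theta}|^2$ for normalised $\psi_n$; that explicit form, the helper names, the grid size, and the truncation choices are all assumptions made for illustration.

```python
import numpy as np

def information_bits(psi, grid=2**16):
    """Single-mode gain: mean of f*log2(f) over a uniform theta grid."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    # Zero-padded FFT samples sum_n psi_n exp(-i n theta_k); mean(f) = 1.
    f = np.abs(np.fft.fft(psi, n=grid)) ** 2
    safe = np.where(f > 0.0, f, 1.0)          # convention: 0 * log2(0) = 0
    return float(np.mean(f * np.log2(safe)))

def truncated_phase_state(N):
    """Equal amplitudes for n = 0..M with M = 2N, so the mean photon number is N."""
    return np.ones(int(round(2 * N)) + 1)

def ssw_state(M):
    """Amplitudes proportional to 1/(1+n) for n = 0..M (normalised later)."""
    return 1.0 / (1.0 + np.arange(M + 1))

def coherent_state(N):
    """Poisson amplitudes, truncated well beyond the mean (assumed cutoff)."""
    n_max = int(N + 12 * np.sqrt(N) + 20)
    n = np.arange(n_max + 1)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, n_max + 1)))))
    return np.exp(-N / 2 + 0.5 * n * np.log(N) - 0.5 * log_fact)

N = 100
info_ts = information_bits(truncated_phase_state(N))
info_cs = information_bits(coherent_state(N))
info_ssw = information_bits(ssw_state(2000))

print(f"TS : {info_ts:.3f} bits, offset from log2 N     = {info_ts - np.log2(N):+.3f}")
print(f"CS : {info_cs:.3f} bits, offset from 0.5 log2 N = {info_cs - 0.5 * np.log2(N):+.3f}")
print(f"SSW: {info_ssw:.3f} bits (truncated at n = 2000)")
```

The printed offsets can be set against the constants quoted for the coherent and truncated phase states, and the SSW value against its one-bit ceiling.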

This marked sub-optimal behaviour may be traced to 
the large-$N$ vanishing peak-area property of the SSW-state
noted by Schleich et al.[5], or to its well-known
extended tails [4]. In yet another view, one can employ
a simple scaling argument to explain this unusual
finite-gain boundedness[7].

Although this example shows that (5) is a robust cri- 
terion, the optimization is now more difficult. In non- 
linear problems of this kind, the simplest line of attack 
is to seek an upper bound. For the given single-mode 
example, Hall[6] has done this by adapting an entropic 
uncertainty relation[20], to obtain the inequality: 

$$ \langle \Delta I(N) \rangle \leq \log_2(N + 1) + N \log_2(1 + 1/N). \tag{13} $$

Comparing this with (12), Hall notes[6] that the
truncated-phase (discrete-phase) states are within 1.220
bits of the theoretical optimum. To interpret the physical
meaning of such pure numbers we consider a typical
asymptotic gain of the form [7]

$$ \langle \Delta I(N) \rangle \sim \log_2 N + \beta = \log_2(2^{\beta} N), $$

where $\beta \leq 1$ (from (13)). Define $\Delta\beta = \beta_{\rm op} - \beta$,
and it becomes clear that $N = 2^{\Delta\beta} N_{\rm op}$, where $2^{\Delta\beta}$ is the
energy-expenditure conversion factor at fixed information gain.
Since $2^{\Delta\beta} \approx 2^{1.220} = 2.329$, (12) is about twice as expensive
as the optimal strategy (and there are a number of candidates
with similar single-mode performance). For all
practical purposes this is not so bad at all (contrast (11),
having geometric inferiority).
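The conversion factor is simple arithmetic, sketched here with the offsets quoted in the text: $\beta_{\rm op} = 1$ for the optimum and $\beta = -0.220$ for the truncated phase state. The variable names are illustrative.

```python
beta_op = 1.0      # optimal offset, as quoted in the text
beta_ts = -0.220   # truncated phase state offset, from Eq. (12)

# Equal gain: log2(N) + beta_ts = log2(N_op) + beta_op  =>  N = 2**(beta_op - beta_ts) * N_op
delta_beta = beta_op - beta_ts
factor = 2.0 ** delta_beta

print(f"delta_beta = {delta_beta:.3f}, N / N_op = {factor:.3f}")
# prints: delta_beta = 1.220, N / N_op = 2.329
```

Because the gain is logarithmic in $N$, a constant offset in bits translates into a constant multiplicative energy cost, independent of $N$.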

Analysis of the multi-mode case envisaged in [4], is 
far more challenging. One must then account for the 
problem of optimally choosing the partition $N = \bar{n} m$.
Recently, Lane et al.[8] showed, via exhaustive Monte-Carlo
simulations of a maximum likelihood data analysis
scheme, that the effective multi-mode error scaling law is
$O(1/N^{0.85})$ for an optimized SSW-partition. This is less
than the $O(1/N^2)$ Shapiro et al. had hoped for (and still
inferior to squeezed-state interferometry), but it shows
that such avenues must be closed.

On these grounds, we advocate the maximization of (8) 
as a robust variational problem. Previously, the Fisher 
information [21] was used as the optimality criterion [8] 
(since that is the key tool in the analysis of variance for 
maximum likelihood methods[22, 23]). However, recent
work in the information theoretic asymptotics of Bayes
methods [24] has shown that the Fisher and Bayes methods
are essentially equivalent for uniform prior in the
large $m$ regime. Of course, only there is the theorem of
Fisher valid anyway [21-23].

Thus we expect the two variational problems will be 
asymptotically equivalent. Further, as we will see, the 
criterion (8) suggests the existence of multi-mode bounds 
analogous to Hall's single-mode bound given at Eq. (13). 

The rationale for preferring (8) in this aim is as follows. 
A key difficulty in the maximum likelihood analysis [8], is 
the huge computational cost posed by the open-ended 
multi-mode data set $\{\phi_1, \phi_2, \ldots\}$. The optimal division
of pulse energy is unknown a priori. 

Further, in the multi-mode problem we must allow for 
any possible statistical behaviour, for both the large m 
limit, and the case where $m = O(1)$ (where the general
expectation seems to be that the optimal result occurs
for $m = 1$). This is problematic because one then needs
corrections to the Fisher result, arising from the higher
order asymptotics of the central limit theorem[23].

Statistical methods to locate the transitional regime to 
the asymptotic normality predicted by Fisher [21] have 
been developed by Braunstein[23]. While this is very
useful to estimate the true performance of a multi-mode 
scheme [8], one must employ Monte-Carlo simulations to 
verify the domain of applicability anyway. 

The new approach we advocate is to recognize that the 
multi-mode performance is limited by the "best possible" 
statistical event (irrespective of how likely it is; i.e. we 
do not care if it is rare). 

Examining Eq. (8) we see that $\langle \Delta I(N) \rangle$ is bounded
above by the posterior distribution $p(\Phi|\phi_1,\ldots,\phi_m)$ of
greatest information (i.e. we replace $p(\phi_1,\ldots,\phi_m)$ in (8)
by a delta function centered on this datum). It is perhaps
intuitively clear (see later) that this is generated by the
(very unlikely) identical data string [25]

$$ \{\phi_1, \phi_2, \ldots, \phi_m\} = \{\phi, \phi, \ldots, \phi\}, $$

since this is the "most peaked" possible product of $m$
single mode functions. In the case of a uniform prior
we can leave $\phi$ arbitrary, since the information is then
independent of $\phi$. Specifically, we choose

$$ p_N(\Phi|\phi_1, \phi_2, \ldots, \phi_m) = \mathcal{N}(m)^{-1} [p_{\bar{n}}(\phi|\Phi)]^m, \tag{14} $$

with $\phi$ arbitrary, where the normalization is

$$ \mathcal{N}(m) = \int_{-\pi}^{\pi} [p_{\bar{n}}(\phi|\Phi)]^m\, d\Phi $$

and $p_{\bar{n}}(\phi|\Phi)$ is the single-mode detection function.

Recently, we proved a theorem [26] (under very general 
conditions that go beyond the present application) that 
the average multi-mode performance of a fixed single- 
mode function is limited by the "best-case" result of m 
identical data. This theorem implies that 

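$$ \langle \Delta I(N) \rangle \leq \int_{-\pi}^{\pi} d\Phi\, p_N(\Phi|\phi,\ldots,\phi) \log_2\bigl(2\pi\, p_N(\Phi|\phi,\ldots,\phi)\bigr) \tag{15} $$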

for the case of a uniform prior. Significantly, we do not
need the central limit theorem to show this (it follows 
from a convexity argument for any convex optimality 
measure, i.e. not just information[25, 26]). 

Thus, to locate an absolute multi-mode performance 
bound, for all m (both large and small), we need only 
study this single integral. Although the true average 
performance must include a statistical analysis of all the 
outcomes, and their likelihood, we see that the setting of 
upper bounds does not require this. 
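The relation between the average gain and the best-case datum can be explored numerically for $m = 2$. The sketch below assumes the Susskind-Glogower form $f(\theta) = |\sum_n \psi_n e^{in\theta}|^2$ for the single-mode function and uses a hypothetical four-component equal-amplitude probe; with a uniform prior the posterior depends on the data only through the offset $\delta = \phi_1 - \phi_2$, so a one-dimensional scan suffices. It illustrates the quantities involved and is not a proof of the theorem.

```python
import numpy as np

def detection_function(psi, grid):
    """f(theta) = 2*pi*p(theta) on a uniform grid (assumed form); mean(f) = 1."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.abs(np.fft.fft(psi, n=grid)) ** 2

def xlog2x(x):
    """x * log2(x) with the convention 0 * log2(0) = 0."""
    return x * np.log2(np.where(x > 0.0, x, 1.0))

def two_mode_information(f):
    """Average two-pulse gain and the per-datum gains versus the data offset."""
    grid = f.size
    weights = np.empty(grid)   # relative likelihood of each offset, mean = 1
    gains = np.empty(grid)     # information of the posterior at each offset
    for k in range(grid):
        g = f * np.roll(f, k)                  # unnormalised posterior in Phi
        weights[k] = g.mean()
        gains[k] = xlog2x(g / weights[k]).mean()
    return float(np.mean(weights * gains)), gains

f = detection_function(np.ones(4), grid=256)   # hypothetical probe, n = 0..3
single = float(xlog2x(f).mean())               # single-mode gain
average, gains = two_mode_information(f)
best_case = float(gains[0])                    # identical data, offset zero

print(f"single-mode gain   : {single:.4f} bits")
print(f"two-mode average   : {average:.4f} bits")
print(f"identical-data gain: {best_case:.4f} bits")
print(f"largest gain found : {gains.max():.4f} bits")
```

Only the identical-data term requires a single integral. The average requires the full scan over offsets, which is the cost the bounding strategy is meant to avoid.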

This is most helpful if, by the analysis of bounds, we 
can show that the multi-mode scheme cannot realize any 
useful performance increase. This is the expected result 
after the work of Lane et al. [8].

The multi-mode problem thus becomes clearer, and a 
resolution of the issue is perhaps within sight. One would 
like to extend (13) so as to limit the Shannon information 
realized by an arbitrary product function (14), where $\bar{n}$
is subject to the usual constraint $N = m\bar{n}$. Although it
remains difficult, this problem is more tractable than the 
maximum likelihood analysis, and may well be amenable 
to a direct analytical assault. 

While a solution is always preferable to a bound, the 
"bounding strategy" appears to be the fastest route to 
discover if multi-mode schemes are worth it. This route 
offers hope that we can avoid the central limit theorem 
corrections needed in maximum likelihood analysis[23]. 

In conclusion, the optimal phase measurement problem 
provides a challenge to the standard methods based upon 
analysis of variance. If, as in this case, all conceivable 
statistical functions are candidates in principle [10], new
robust methods seem essential. 

This work was sponsored by the Australian Research 
Council and was largely completed some twenty years
ago. However, at that time, Bayesian methods were
not widely understood and certainly not accepted within
the physics community. The variational problem posed
herein remains unsolved to this day. 